2 research outputs found

Anonymisation of geographical distance matrices via Lipschitz embedding

Authors: AS Whittemore, BS Everitt, CD Lloyd, DR Helsel, G Duncan, GR Hjaltason, GT Duncan, H-W Jung, J Bourgain, J Höhne, J Konc, JJ Trinckes, K Emam El, K Kenthapadi, K Riesen, KC Clarke, KH Hampton, L Sweeney, LA Waller, M Kroll, Martin Kroll, MM Merener, MP Armstrong, MP Gutmann, Rainer Schnell, RS Bivand, S Dray, SC Wieland, T Dalenius, Ö Uzuner
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
BACKGROUND: Anonymisation of spatially referenced data has received increasing attention in recent years. Whereas research has focused on the anonymisation of point locations, the disclosure risk arising from the publication of inter-point distances, and the corresponding anonymisation methods, have not been studied systematically. METHODS: We propose a new anonymisation method for the release of geographical distances between records of a microdata file (for example, patients in a medical database). We discuss a data release scheme in which microdata without coordinates are released together with a distance matrix between the corresponding rows of the microdata set. In contrast to most other approaches, this method preserves small distances better than larger distances. The distances are modified by a variant of Lipschitz embedding. RESULTS: The effects of the embedding parameters on the risk of data disclosure are evaluated by linkage experiments using simulated data. The results indicate small disclosure risks for appropriate embedding parameters. CONCLUSION: The proposed method is useful if published distance information might be misused for the re-identification of records. The method can be used for publishing scientific-use files and as an additional tool for record-linkage studies.
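The abstract does not specify the variant of Lipschitz embedding that the authors use. As a rough illustration only, the following minimal sketch shows the classic form of the technique: each record is mapped to the vector of its distances to a few randomly chosen reference sets, and distances are then recomputed between those vectors. The function names, the Chebyshev (maximum) norm, the parameters `n_sets` and `set_size`, and the toy coordinates are all assumptions made for this example and are not taken from the paper.

```python
import math
import random


def lipschitz_embed(dist, reference_sets):
    """Map record i to the vector of its distances to each reference set.

    The distance from a record to a set is the minimum distance to any member.
    """
    return [[min(dist[i][a] for a in ref) for ref in reference_sets]
            for i in range(len(dist))]


def chebyshev(u, v):
    """Maximum absolute coordinate difference between two vectors."""
    return max(abs(a - b) for a, b in zip(u, v))


def embedded_distance_matrix(dist, n_sets=4, set_size=2, seed=0):
    """Return a modified distance matrix computed in the embedded space.

    n_sets and set_size are hypothetical embedding parameters for this sketch.
    """
    rng = random.Random(seed)
    n = len(dist)
    # Each reference set is a random sample of record indices.
    refs = [rng.sample(range(n), set_size) for _ in range(n_sets)]
    emb = lipschitz_embed(dist, refs)
    return [[chebyshev(emb[i], emb[j]) for j in range(n)] for i in range(n)]


# Toy coordinates (invented); only the distance matrix would be processed.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0), (5.0, 1.0)]
original = [[math.dist(p, q) for q in points] for p in points]
released = embedded_distance_matrix(original)
```

With the maximum norm, the embedded distance never exceeds the original distance, because the distance from a point to a fixed set changes by at most the distance moved. How much disclosure risk remains depends on the embedding parameters, which is the question the paper evaluates with linkage experiments.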

City Research Online

Crossref

Springer - Publisher Connector

PubMed Central

Evaluating privacy-preserving record linkage using cryptographic long-term keys and multibit trees on large medical datasets.

Authors: A McCallum, Adrian P. Brown, Christian Borgs, CJ Bradley, D Karapiperis, D Rosman, D Vatsalan, DP Jutte, E Durham, EA Durham, EL Brook, F Niedermeyer, G Lawrence, GH Shah, IA Binswanger, J Smith, JH Boyd, JJ Trinckes, JMM Evans, M Kroll, M Kuzu, MA Hernández, MG Maxfield, P Christen, R Schnell, Rainer Schnell, SA McDonald, Sean M. Randall, SM Randall, TG Kristensen, TL Dassanayake, TN Herzog, Z Wan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Background: Integrating medical data from different sources by record linkage is a powerful technique increasingly used in medical research. In many jurisdictions, the unique personal identifiers needed for linking the records are unavailable. Since sensitive attributes, such as names, have to be used instead, privacy regulations usually demand that these identifiers be encrypted. The corresponding set of techniques for privacy-preserving record linkage (PPRL) has received widespread attention. One recent method is based on Bloom filters. Due to their superior resilience against cryptographic attacks, composite Bloom filters (cryptographic long-term keys, CLKs) are considered best practice for privacy in PPRL. The real-world performance of these techniques on large-scale data has so far been unknown. Methods: Using a large subset of Australian hospital admission data, we tested the performance of an innovative PPRL technique (CLKs using multibit trees) against a gold standard derived from clear-text probabilistic record linkage. Linkage time and linkage quality (recall, precision and F-measure) were evaluated. Results: Clear-text probabilistic linkage resulted in marginally higher precision and recall than CLKs. PPRL required more computing time, but 5 million records could still be de-duplicated within one day. However, the PPRL approach required fine-tuning of parameters. Conclusions: We argue that the increased privacy of PPRL comes at the price of small losses in precision and recall and a large increase in computational burden and setup time. These costs seem to be acceptable in most applied settings, but they have to be considered in the decision to apply PPRL. Further research on the optimal automatic choice of parameters is needed.
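As a rough illustration of the ideas named in this abstract, the sketch below hashes the character bigrams of several identifiers into a single Bloom filter (the general idea behind a CLK), compares two filters with the Dice coefficient, and computes precision, recall and F-measure for a set of predicted links. It is not the authors' implementation: the filter length, the number of hash functions, the keyed SHA-256 hashing scheme, the secret key and all function names are assumptions chosen for this example, and the multibit-tree search used in the paper is not shown.

```python
import hashlib
import hmac


def bigrams(value):
    """Return the set of character bigrams of a padded, lower-cased string."""
    padded = f" {value.lower()} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def clk(fields, secret, length=1000, k=10):
    """Hash the bigrams of all fields into one Bloom filter.

    The filter is represented as the set of positions that are set to 1.
    length and k are hypothetical parameters, not values from the paper.
    """
    bits = set()
    for field in fields:
        for gram in bigrams(field):
            for i in range(k):
                digest = hmac.new(secret, f"{i}:{gram}".encode(),
                                  hashlib.sha256).digest()
                bits.add(int.from_bytes(digest[:8], "big") % length)
    return bits


def dice(a, b):
    """Dice coefficient of two Bloom filters given as sets of set bits."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def precision_recall_f(predicted, truth):
    """Precision, recall and F-measure of predicted links against true links."""
    true_pos = len(predicted & truth)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


secret = b"shared-secret"  # hypothetical key agreed between the data holders
a = clk(["smith", "john"], secret)
b = clk(["smyth", "john"], secret)
c = clk(["garcia", "maria"], secret)
```

In this sketch, `dice(a, b)` is expected to be clearly higher than `dice(a, c)`, since the first pair shares most of its bigrams. A linkage run would classify pairs above some similarity threshold as matches, and choosing that threshold is one of the parameters the abstract says needs fine-tuning.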

Crossref

Directory of Open Access Journals

espace@Curtin